AITopics | epoch-wise double descent

Collaborating Authors

epoch-wise double descent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Double Descent and Overparameterization in Particle Physics Data

Vigl, Matthias, Heinrich, Lukas

arXiv.org Artificial IntelligenceSep-3-2025

Recently, the benefit of heavily overparameterized models has been observed in machine learning tasks: models with enough capacity to easily cross the \emph{interpolation threshold} improve in generalization error compared to the classical bias-variance tradeoff regime. We demonstrate this behavior for the first time in particle physics data and explore when and where `double descent' appears and under which circumstances overparameterization results in a performance gain.

artificial intelligence, double descent, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2509.01397

Country: Europe > Germany (0.15)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Add feedback

On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process

Iwase, Shun, Takahashi, Shuya, Inoue, Nakamasa, Yokota, Rio, Nakamura, Ryo, Kataoka, Hirokatsu

arXiv.org Artificial IntelligenceMar-4-2025

The double descent phenomenon, which deviates from the traditional bias-variance trade-off theory, attracts considerable research attention; however, the mechanism of its occurrence is not fully understood. On the other hand, in the study of convolutional neural networks (CNNs) for image recognition, methods are proposed to quantify the bias on shape features versus texture features in images, determining which features the CNN focuses on more. In this work, we hypothesize that there is a relationship between the shape/texture bias in the learning process of CNNs and epoch-wise double descent, and we conduct verification. As a result, we discover double descent/ascent of shape/texture bias synchronized with double descent of test error under conditions where epoch-wise double descent is observed. Quantitative evaluations confirm this correlation between the test errors and the bias values from the initial decrease to the full increase in test error. Interestingly, double descent/ascent of shape/texture bias is observed in some cases even in conditions without label noise, where double descent is thought not to occur. These experimental results are considered to contribute to the understanding of the mechanisms behind the double descent phenomenon and the learning process of CNNs in image recognition.

descent, double descent, texture bias, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-78389-0_7

2503.02302

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Towards understanding epoch-wise double descent in two-layer linear neural networks

Olmin, Amanda, Lindsten, Fredrik

arXiv.org Machine LearningJul-13-2024

Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve exhibiting two descents under the course of learning. Understanding the mechanisms driving this behaviour is crucial not only for understanding the generalisation behaviour of machine learning models in general, but also for employing conventional selection methods, such as the use of early stopping to mitigate overfitting. While we ultimately want to draw conclusions of more complex models, such as deep neural networks, a majority of theoretical conclusions regarding the underlying cause of epoch-wise double descent are based on simple models, such as standard linear regression. To start bridging this gap, we study epoch-wise double descent in two-layer linear neural networks. First, we derive a gradient flow for the linear two-layer model, that bridges the learning dynamics of the standard linear regression model, and the linear two-layer diagonal network with quadratic weights. Second, we identify additional factors of epoch-wise double descent emerging with the extra model layer, by deriving necessary conditions for the generalisation error to follow a double descent pattern. While epoch-wise double descent in linear regression has been attributed to differences in input variance, in the two-layer model, also the singular values of the input-output covariance matrix play an important role. This opens up for further questions regarding unidentified factors of epoch-wise double descent for truly deep models.

double descent, epoch-wise double descent, inflection point, (17 more...)

arXiv.org Machine Learning

2407.09845

Country:

North America > United States (0.14)
Europe > Sweden > Östergötland County > Linköping (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

Add feedback

Multi-scale Feature Learning Dynamics: Insights for Double Descent

Pezeshki, Mohammad, Mitra, Amartya, Bengio, Yoshua, Lajoie, Guillaume

arXiv.org Artificial IntelligenceDec-6-2021

A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents as the training time increases. By leveraging tools from statistical physics, we study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions for the evolution of generalization error over training. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical experiments where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2112.03215

Country:

North America > United States > California > Riverside County > Riverside (0.04)
North America > Canada > Quebec (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Optimization Variance: Exploring Generalization Properties of DNNs

Zhang, Xiao, Wu, Dongrui, Xiong, Haoyi, Dai, Bo

arXiv.org Artificial IntelligenceJun-3-2021

Unlike the conventional wisdom in statistical learning theory, the test error of a deep neural network (DNN) often demonstrates double descent: as the model complexity increases, it first follows a classical U-shaped curve and then shows a second descent. Through bias-variance decomposition, recent studies revealed that the bell-shaped variance is the major cause of model-wise double descent (when the DNN is widened gradually). This paper investigates epoch-wise double descent, i.e., the test error of a DNN also shows double descent as the number of training epoches increases. By extending the bias-variance analysis to epoch-wise double descent of the zero-one loss, we surprisingly find that the variance itself, without the bias, varies consistently with the test error. Inspired by this result, we propose a novel metric, optimization variance (OV), to measure the diversity of model updates caused by the stochastic gradients of random training batches drawn in the same iteration. OV can be estimated using samples from the training set only but correlates well with the (unknown) \emph{test} error, and hence early stopping may be achieved without using a validation set.

double descent, epoch-wise double descent, variance, (13 more...)

arXiv.org Artificial Intelligence

2106.01714

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(12 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Heckel, Reinhard, Yilmaz, Fatih Furkan

arXiv.org Machine LearningSep-19-2020

This intriguing double descent behavior also occurs as a function of training epochs, and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: It is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating this by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and for ii) a two-layer neural network, where the first and second layer each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically, and show that eliminating epoch-wise double descent through adjusting stepsizes of different layers improves the early stopping performance significantly.

artificial intelligence, descent, machine learning, (17 more...)

arXiv.org Machine Learning

2007.10099

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback